Intermediate Representation
   HOME

TheInfoList



OR:

An intermediate representation (IR) is the
data structure In computer science, a data structure is a data organization, management, and storage format that is usually chosen for efficient access to data. More precisely, a data structure is a collection of data values, the relationships among them, a ...
or code used internally by a
compiler In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs that ...
or
virtual machine In computing, a virtual machine (VM) is the virtualization/emulation of a computer system. Virtual machines are based on computer architectures and provide functionality of a physical computer. Their implementations may involve specialized hardw ...
to represent
source code In computing, source code, or simply code, is any collection of code, with or without comments, written using a human-readable programming language, usually as plain text. The source code of a program is specially designed to facilitate the wo ...
. An IR is designed to be conducive to further processing, such as
optimization Mathematical optimization (alternatively spelled ''optimisation'') or mathematical programming is the selection of a best element, with regard to some criterion, from some set of available alternatives. It is generally divided into two subfi ...
and
translation Translation is the communication of the Meaning (linguistic), meaning of a #Source and target languages, source-language text by means of an Dynamic and formal equivalence, equivalent #Source and target languages, target-language text. The ...
. A "good" IR must be ''accurate'' – capable of representing the source code without loss of information – and ''independent'' of any particular source or target language. An IR may take one of several forms: an in-memory
data structure In computer science, a data structure is a data organization, management, and storage format that is usually chosen for efficient access to data. More precisely, a data structure is a collection of data values, the relationships among them, a ...
, or a special
tuple In mathematics, a tuple is a finite ordered list (sequence) of elements. An -tuple is a sequence (or ordered list) of elements, where is a non-negative integer. There is only one 0-tuple, referred to as ''the empty tuple''. An -tuple is defi ...
- or
stack Stack may refer to: Places * Stack Island, an island game reserve in Bass Strait, south-eastern Australia, in Tasmania’s Hunter Island Group * Blue Stack Mountains, in Co. Donegal, Ireland People * Stack (surname) (including a list of people ...
-based
code In communications and information processing, code is a system of rules to convert information—such as a letter, word, sound, image, or gesture—into another form, sometimes shortened or secret, for communication through a communication ...
readable by the program. In the latter case it is also called an ''intermediate language''. A canonical example is found in most modern compilers. For example, the CPython interpreter transforms the linear human-readable text representing a program into an intermediate graph structure that allows
flow analysis Flow may refer to: Science and technology * Fluid flow, the motion of a gas or liquid * Flow (geomorphology), a type of mass wasting or slope movement in geomorphology * Flow (mathematics), a group action of the real numbers on a set * Flow (psy ...
and re-arrangement before execution. Use of an intermediate representation such as this allows compiler systems like the
GNU Compiler Collection The GNU Compiler Collection (GCC) is an optimizing compiler produced by the GNU Project supporting various programming languages, hardware architectures and operating systems. The Free Software Foundation (FSF) distributes GCC as free software ...
and
LLVM LLVM is a set of compiler and toolchain technologies that can be used to develop a front end for any programming language and a back end for any instruction set architecture. LLVM is designed around a language-independent intermediate represen ...
to be used by many different source languages to generate code for many different target
architectures Architecture is the art and technique of designing and building, as distinguished from the skills associated with construction. It is both the process and the product of sketching, conceiving, planning, designing, and constructing buildings o ...
.


Intermediate language

An intermediate language is the language of an
abstract machine An abstract machine is a computer science theoretical model that allows for a detailed and precise analysis of how a computer system functions. It is analogous to a mathematical function in that it receives inputs and produces outputs based on pre ...
designed to aid in the analysis of
computer program A computer program is a sequence or set of instructions in a programming language for a computer to execute. Computer programs are one component of software, which also includes documentation and other intangible components. A computer program ...
s. The term comes from their use in
compiler In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs that ...
s, where the source code of a program is translated into a form more suitable for code-improving transformations before being used to generate
object Object may refer to: General meanings * Object (philosophy), a thing, being, or concept ** Object (abstract), an object which does not exist at any particular time or place ** Physical object, an identifiable collection of matter * Goal, an ...
or
machine A machine is a physical system using Power (physics), power to apply Force, forces and control Motion, movement to perform an action. The term is commonly applied to artificial devices, such as those employing engines or motors, but also to na ...
code for a target machine. The design of an intermediate language typically differs from that of a practical
machine language In computer programming, machine code is any low-level programming language, consisting of machine language instructions, which are used to control a computer's central processing unit (CPU). Each instruction causes the CPU to perform a very ...
in three fundamental ways: * Each instruction represents exactly one fundamental operation; ''e.g.'' "shift-add"
addressing mode Addressing modes are an aspect of the instruction set architecture in most central processing unit (CPU) designs. The various addressing modes that are defined in a given instruction set architecture define how the machine language instructions in ...
s common in
microprocessors A microprocessor is a computer processor where the data processing logic and control is included on a single integrated circuit, or a small number of integrated circuits. The microprocessor contains the arithmetic, logic, and control circu ...
are not present. *
Control flow In computer science, control flow (or flow of control) is the order in which individual statements, instructions or function calls of an imperative program are executed or evaluated. The emphasis on explicit control flow distinguishes an ''imper ...
information may not be included in the instruction set. * The number of
processor register A processor register is a quickly accessible location available to a computer's processor. Registers usually consist of a small amount of fast storage, although some registers have specific hardware functions, and may be read-only or write-only. ...
s available may be large, even limitless. A popular format for intermediate languages is
three-address code In computer science, three-address code (often abbreviated to TAC or 3AC) is an intermediate code used by optimizing compilers to aid in the implementation of code-improving transformations. Each TAC instruction has at most three operands and is ty ...
. The term is also used to refer to languages used as intermediates by some
high-level programming language In computer science, a high-level programming language is a programming language with strong Abstraction (computer science), abstraction from the details of the computer. In contrast to low-level programming languages, it may use natural language ...
s which do not output object or machine code themselves, but output the intermediate language only. This intermediate language is submitted to a compiler for such language, which then outputs finished object or machine code. This is usually done to ease the process of
optimization Mathematical optimization (alternatively spelled ''optimisation'') or mathematical programming is the selection of a best element, with regard to some criterion, from some set of available alternatives. It is generally divided into two subfi ...
or to increase portability by using an intermediate language that has compilers for many
processors A central processing unit (CPU), also called a central processor, main processor or just processor, is the electronic circuitry that executes instructions comprising a computer program. The CPU performs basic arithmetic, logic, controlling, ...
and
operating systems An operating system (OS) is system software that manages computer hardware, software resources, and provides common services for computer programs. Time-sharing operating systems schedule tasks for efficient use of the system and may also inc ...
, such as C. Languages used for this fall in complexity between high-level languages and low-level languages, such as
assembly language In computer programming, assembly language (or assembler language, or symbolic machine code), often referred to simply as Assembly and commonly abbreviated as ASM or asm, is any low-level programming language with a very strong correspondence be ...
s.


Languages

Though not explicitly designed as an intermediate language, C's nature as an abstraction of
assembly Assembly may refer to: Organisations and meetings * Deliberative assembly, a gathering of members who use parliamentary procedure for making decisions * General assembly, an official meeting of the members of an organization or of their representa ...
and its ubiquity as the de facto system language in
Unix-like A Unix-like (sometimes referred to as UN*X or *nix) operating system is one that behaves in a manner similar to a Unix system, although not necessarily conforming to or being certified to any version of the Single UNIX Specification. A Unix-li ...
and other operating systems has made it a popular intermediate language:
Eiffel Eiffel may refer to: Places * Eiffel Peak, a summit in Alberta, Canada * Champ de Mars – Tour Eiffel station, Paris, France; a transit station Structures * Eiffel Tower, in Paris, France, designed by Gustave Eiffel * Eiffel Bridge, Ungheni, M ...
,
Sather Sather is an object-oriented programming language. It originated circa 1990 at the International Computer Science Institute (ICSI) at the University of California, Berkeley, developed by an international team led by Steve Omohundro. It supports ...
,
Esterel Esterel is a synchronous programming language for the development of complex reactive systems. The imperative programming style of Esterel allows the simple expression of parallelism and preemption. As a consequence, it is well suited for contr ...
, some
dialect The term dialect (from Latin , , from the Ancient Greek word , 'discourse', from , 'through' and , 'I speak') can refer to either of two distinctly different types of Linguistics, linguistic phenomena: One usage refers to a variety (linguisti ...
s of
Lisp A lisp is a speech impairment in which a person misarticulates sibilants (, , , , , , , ). These misarticulations often result in unclear speech. Types * A frontal lisp occurs when the tongue is placed anterior to the target. Interdental lisping ...
( Lush,
Gambit A gambit (from Italian , the act of tripping someone with the leg to make them fall) is a chess opening in which a player sacrifices with the aim of achieving a subsequent advantage. The word '' gambit'' is also sometimes used to describe sim ...
),
Haskell Haskell () is a general-purpose, statically-typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research and industrial applications, Haskell has pioneered a number of programming lan ...
(
Glasgow Haskell Compiler The Glasgow Haskell Compiler (GHC) is an open-source native code compiler for the functional programming language Haskell. It provides a cross-platform environment for the writing and testing of Haskell code and it supports numerous extensions, ...
),
Squeak Squeak is an object-oriented, class-based, and reflective programming language. It was derived from Smalltalk-80 by a group that included some of Smalltalk-80's original developers, initially at Apple Computer, then at Walt Disney Imagineering, ...
's Smalltalk-subset Slang, Nim,
Cython Cython () is a programming language that aims to be a superset of the Python programming language, designed to give C-like performance with code that is written mostly in Python with optional additional C-inspired syntax. Cython is a compiled ...
,
Seed7 Seed7 is an extensible general-purpose programming language designed by Thomas Mertes. It is syntactically similar to Pascal and Ada. Along with many other features, it provides an extension mechanism. Daniel Zingaro"Modern Extensible Languages" ...
, SystemTap,
Vala Vala or VALA may refer to: Religion and mythology * Vala (Vedic), a demon or a stone cavern in the Hindu scriptures * Völva, also spelled Vala, a priestess in Norse mythology and Norse paganism Fiction * Vala (Middle-earth), an angelic being in ...
, V, and others make use of C as an intermediate language. Variants of C have been designed to provide C's features as a portable
assembly language In computer programming, assembly language (or assembler language, or symbolic machine code), often referred to simply as Assembly and commonly abbreviated as ASM or asm, is any low-level programming language with a very strong correspondence be ...
, including
C-- C-- (pronounced ''C minus minus'') is a C (programming language), C-like programming language. Its creators, functional programming researchers Simon Peyton Jones and Norman Ramsey, designed it to be generated mainly by compilers for very high-l ...
and the
C Intermediate Language George Ciprian Necula is a Romanian computer scientist, engineer at Google, and former professor at the University of California, Berkeley who does research in the area of programming languages and software engineering, with a particular focus on ...
. Any language targeting a
virtual machine In computing, a virtual machine (VM) is the virtualization/emulation of a computer system. Virtual machines are based on computer architectures and provide functionality of a physical computer. Their implementations may involve specialized hardw ...
or
p-code machine In computer programming, a p-code machine (portable code machine) is a virtual machine designed to execute ''p-code'' (the assembly language or machine code of a hypothetical central processing unit (CPU)). This term is applied both generically t ...
can be considered an intermediate language: *
Java bytecode In computing, Java bytecode is the bytecode-structured instruction set of the Java virtual machine (JVM), a virtual machine that enables a computer to run programs written in the Java programming language and several other programming languages, ...
* Microsoft's
Common Intermediate Language Common Intermediate Language (CIL), formerly called Microsoft Intermediate Language (MSIL) or Intermediate Language (IL), is the intermediate language binary instruction set defined within the Common Language Infrastructure (CLI) specification. ...
is an intermediate language designed to be shared by all compilers for the
.NET Framework The .NET Framework (pronounced as "''dot net"'') is a proprietary software framework developed by Microsoft that runs primarily on Microsoft Windows. It was the predominant implementation of the Common Language Infrastructure (CLI) until bein ...
, before static or dynamic compilation to machine code. * While most intermediate languages are designed to support statically typed languages, the
Parrot intermediate representation The Parrot intermediate representation (PIR), previously called Intermediate code (IMC), is one of the two assembly languages for the Parrot virtual machine. The other is Parrot assembly language or PASM. Compared to PASM, PIR exists at a slightly ...
is designed to support dynamically typed languages—initially Perl and Python. *
TIMI The Thrombolysis In Myocardial Infarction, or TIMI Study Group, is an Academic Research Organization (ARO) affiliated with Brigham and Women's Hospital and Harvard Medical School dedicated to advancing the knowledge and care of patients with car ...
is used by compilers on the
IBM i IBM i (the ''i'' standing for ''integrated'') is an operating system developed by IBM for IBM Power Systems. It was originally released in 1988 as OS/400, as the sole operating system of the IBM AS/400 line of systems. It was renamed to i5/OS in ...
platform. *
O-code BCPL ("Basic Combined Programming Language") is a procedural, imperative, and structured programming language. Originally intended for writing compilers for other languages, BCPL is no longer in common use. However, its influence is still ...
for
BCPL BCPL ("Basic Combined Programming Language") is a procedural, imperative, and structured programming language. Originally intended for writing compilers for other languages, BCPL is no longer in common use. However, its influence is still ...
*
MATLAB MATLAB (an abbreviation of "MATrix LABoratory") is a proprietary multi-paradigm programming language and numeric computing environment developed by MathWorks. MATLAB allows matrix manipulations, plotting of functions and data, implementation ...
precompiled code *
Microsoft P-Code In computer programming, a p-code machine (portable code machine) is a virtual machine designed to execute ''p-code'' (the assembly language or machine code of a hypothetical central processing unit (CPU)). This term is applied both generically t ...
*
Pascal Pascal, Pascal's or PASCAL may refer to: People and fictional characters * Pascal (given name), including a list of people with the name * Pascal (surname), including a list of people and fictional characters with the name ** Blaise Pascal, Fren ...
p-code Bytecode (also called portable code or p-code) is a form of instruction set designed for efficient execution by a software Interpreter (computing), interpreter. Unlike Human-readable code, human-readable source code, bytecodes are compact nume ...
The
GNU Compiler Collection The GNU Compiler Collection (GCC) is an optimizing compiler produced by the GNU Project supporting various programming languages, hardware architectures and operating systems. The Free Software Foundation (FSF) distributes GCC as free software ...
(GCC) uses several intermediate languages internally to simplify portability and
cross-compilation A cross compiler is a compiler capable of creating executable code for a platform other than the one on which the compiler is running. For example, a compiler that runs on a PC but generates code that runs on an Android smartphone is a cross ...
. Among these languages are * the historical
Register Transfer Language In computer science, register transfer language (RTL) is a kind of intermediate representation (IR) that is very close to assembly language, such as that which is used in a compiler. It is used to describe data flow at the register-transfer lev ...
(RTL) * the tree language
GENERIC Generic or generics may refer to: In business * Generic term, a common name used for a range or class of similar things not protected by trademark * Generic brand, a brand for a product that does not have an associated brand or trademark, other ...
* the SSA-based GIMPLE. (Lower-level than GENERIC; input for most optimizers; has a compact "bytecode" notation.) GCC supports generating these IRs, as a final target: * HSA Intermediate Layer *
LLVM Intermediate Representation LLVM is a set of compiler and toolchain technologies that can be used to develop a front end for any programming language and a back end for any instruction set architecture. LLVM is designed around a language-independent intermediate represe ...
(converted from GIMPLE in the now-defunct llvm-gcc which uses LLVM optimizers and codegen) The
LLVM LLVM is a set of compiler and toolchain technologies that can be used to develop a front end for any programming language and a back end for any instruction set architecture. LLVM is designed around a language-independent intermediate represen ...
compiler framework is based on the
LLVM IR LLVM is a set of compiler and toolchain technologies that can be used to develop a front end for any programming language and a back end for any instruction set architecture. LLVM is designed around a language-independent intermediate represen ...
intermediate language, of which the compact, binary serialized representation is also referred to as "bitcode" and has been productized by Apple. Like GIMPLE Bytecode, LLVM Bitcode is useful in link-time optimization. Like GCC, LLVM also targets some IRs meant for direct distribution, including Google's
PNaCl Google Native Client (NaCl) is a discontinued sandbox (computer security), sandboxing technology for running either a subset of Intel x86, ARM architecture, ARM, or MIPS architecture, MIPS native code, or a portable executable, in a sandbox. It ...
IR and SPIR. A further development within LLVM is the use of ''Multi-Level Intermediate Representation'' (MLIR) with the potential to generate code for different heterogeneous targets, and to combine the outputs of different compilers. The ILOC intermediate language is used in classes on compiler design as a simple target language."CISC 471 Compiler Design"
by Uli Kremer


Other

Static analysis Static analysis, static projection, or static scoring is a simplified analysis wherein the effect of an immediate change to a system is calculated without regard to the longer-term response of the system to that change. If the short-term effect i ...
tools often use an intermediate representation. For instance,
radare2 Radare2 (also known as r2) is a complete framework for reverse-engineering and analyzing binaries; composed of a set of small utilities that can be used together or independently from the command line. Built around a disassembler for computer soft ...
is a toolbox for binary files analysis and reverse-engineering. It uses the intermediate languages ESIL and REIL to analyze binary files.


See also

*
Interlingual machine translation Interlingual machine translation is one of the classic approaches to machine translation. In this approach, the source language, i.e. the text to be translated is transformed into an interlingua, i.e., an abstract language-independent representa ...
*
Pivot language A pivot language, sometimes also called a bridge language, is an artificial or natural language used as an intermediary language for translation between many different languages – to translate between any pair of languages A and B, one translate ...
*
Abstract syntax tree In computer science, an abstract syntax tree (AST), or just syntax tree, is a tree representation of the abstract syntactic structure of text (often source code) written in a formal language. Each node of the tree denotes a construct occurring ...
*
Bytecode Bytecode (also called portable code or p-code) is a form of instruction set designed for efficient execution by a software interpreter. Unlike human-readable source code, bytecodes are compact numeric codes, constants, and references (norma ...
(Intermediate code) *
Symbol table In computer science, a symbol table is a data structure used by a language translator such as a compiler or interpreter, where each identifier (or symbols), constants, procedures and functions in a program's source code is associated with info ...
*
Source-to-source compiler A source-to-source translator, source-to-source compiler (S2S compiler), transcompiler, or transpiler is a type of translator that takes the source code of a program written in a programming language as its input and produces an equivalent sou ...
*
Graph rewriting In computer science, graph transformation, or graph rewriting, concerns the technique of creating a new graph out of an original graph algorithmically. It has numerous applications, ranging from software engineering (software construction and also ...
and
term rewriting In mathematics, computer science, and logic, rewriting covers a wide range of methods of replacing subterms of a formula with other terms. Such methods may be achieved by rewriting systems (also known as rewrite systems, rewrite engines, or reduc ...
*
UNCOL UNCOL (Universal Computer Oriented Language) is a universal intermediate language for compilers. The idea was introduced in 1958, by a SHARE ad-hoc committee. It was never fully specified or implemented; in many ways it was more a concept than a l ...


References


External links


The Stanford SUIF Group
{{DEFAULTSORT:Intermediate language Compiler construction Programming language classification